Abstract

Previous approaches to systematic state-space exploration for testing
multi-threaded programs have proposed context-bounding [19]
and depth-bounding [5] as effective ranking algorithms for testing
multi-threaded programs. This paper proposes two new metrics
to rank thread schedules for systematic state-space exploration. Our 
metrics are based on characterization of a concurrency bug using 
v (the minimum number of distinct variables that need to be in- 
volved for the bug to manifest) and t (the minimum number of 
distinct threads among which scheduling constraints are required 
to manifest the bug). Our algorithm is based on the hypothesis
that in practice, most concurrency bugs have low v (typically 1-2)
and low t (typically 2-4) characteristics. We iteratively explore
the search space of schedules in increasing order of v and t. We
show qualitatively and empirically that our algorithm finds common
bugs in fewer execution runs than previous approaches. We also show
that using v and t improves the lower bounds on the probability of
finding bugs through randomized algorithms.

Systematic exploration of schedules requires instrumenting 
each variable access made by a program, which can be very expensive
and severely limits the applicability of this approach. Previous
work [20] has avoided this problem by interposing only on
synchronization operations (and ignoring other variable accesses). 
We demonstrate that by using variable bounding (v) and a static 
imprecise alias analysis, we can interpose on all variable accesses 
(and not just synchronization operations) at 10-100x less overhead 
than previous approaches. 

Categories and Subject Descriptors D.2.4 [Software Engineer- 
ing]: Software/Program Verification — formal methods, validation; 
F.3.1 [Logics and Meanings of Programs]: Specifying and Veri- 
fying and Reasoning about Programs — mechanical verification, 
specification techniques; D.2.5 [Software Engineering]: Testing 
and Debugging — debugging aids, diagnostics, monitors, tracing 

General Terms Algorithms, Reliability, Verification 

Keywords Concurrency, context-bounding, variable-bounding, 
thread-bounding, model checking, multi-threading, concurrency- 
bug classification, shared-memory programs, software testing 



1. Introduction 

Testing concurrent programs is notoriously difficult because of their
inherent non-determinism. An effective but expensive approach is
model-checking, where all possible schedules of a program are ex- 
ecuted to ascertain the absence of a bug. Unfortunately, the space 
of all schedules is huge, and exhaustively enumerating it is usually
infeasible. For a multi-threaded program with n threads, each
executing k instructions, the total number of schedules (or thread
interleavings) is

$$\frac{(nk)!}{(k!)^n}$$

This space of schedules further explodes if each instruction is not
guaranteed to be atomic. For a very small program with k = 100 and
n = 2, the total number of interleavings is around $10^{59}$.
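The interleaving count quoted above can be checked with exact integer arithmetic. The following is a minimal sketch, assuming every instruction is atomic; the helper name `num_interleavings` is ours and does not come from any tool discussed in this paper.

```python
from math import factorial


def num_interleavings(n: int, k: int) -> int:
    """Number of ways to interleave n threads of k atomic instructions each.

    This is the multinomial coefficient (nk)! / (k!)^n.
    """
    return factorial(n * k) // factorial(k) ** n


if __name__ == "__main__":
    count = num_interleavings(n=2, k=100)
    # Printed in scientific notation; the value is on the order of 10^59.
    print(f"{count:.3e}")
```

Even this tiny two-thread case is far beyond what exhaustive enumeration can cover, which is what motivates ranking schedules rather than enumerating them all.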

As it is practically impossible to exhaustively explore the entire 
state space of all schedules for any useful program, an alternative 
is to try and maximize the probability of uncovering a bug rather 
than trying to ascertain its absence. Many different approaches have 
been proposed in this direction. Musuvathi and Qadeer proposed
using context-bound to rank schedules, and showed that it is an
effective method to uncover most common bugs [13]. A context-bound
is the number of pre-emptive context-switches required to execute
a schedule. The schedules are enumerated in increasing order of
their context-bound, i.e., all schedules with context bound c - 1 are
executed before any schedule with context bound c. Musuvathi and
Qadeer report experiments on real-world applications, and show
that all known bugs in those applications were found at
context-bound values of 2 or less.

Iterative context bounding is an effective way of ranking schedules.
However, this metric is often too coarse-grained. For a
multi-threaded program with n threads, each executing k instructions, the
total number of schedules at context-bound c grows with $(nk)^c$.
For a small program with k = 10,000 instructions and n = 4,
the number of schedules at context bound 2 is on the order of
$10^9$. Musuvathi et al.'s concurrency-testing tool based on this
algorithm, CHESS, reduces this search space by considering only
explicit synchronization operations as possible pre-emption points,
thus reducing k by at least 2-3 orders of magnitude. This
simplification is justified by the assumption that most programs follow a
mutual-exclusion locking discipline, and hence all shared-memory
accesses will be protected by lock() and unlock() calls. Violation
of this locking discipline can be separately checked using other
race-detection tools. This approach, though effective, is not
completely general, as many systems deliberately avoid explicit
synchronization [27], often for performance reasons.

Another approach to testing multithreaded programs is randomization
of scheduling decisions with probabilistic guarantees.
Burckhardt et al. [5] characterize a concurrency bug by its depth,
the minimum number of scheduling constraints required to find the
bug. They give an algorithm with a lower bound on the
probability of finding a depth-d bug. Ranking on bug-depth d
restricts the search space of a multi-threaded program with n threads
and executing k instructions to $nk^{d-1}$. This, again, may be too
large for most programs.

Another recent tool, CTrigger [22], focuses on atomicity-violation
bugs and preferentially searches the space of schedules
that are likely to trigger these bugs. CTrigger first profiles execu- 
tions of the program to determine the shared variables and their 
unprotected accesses. It then attempts to generate schedules that 
are likely to violate assumptions of atomicity (for example, by in- 
serting a write to location M by some thread between two accesses 
to the same location M by another thread). CTrigger is primarily 
interested in atomicity-violation bugs and often overlooks other 
concurrency bugs. 

Our first contribution is to propose the use of the number of
variables to further classify and reduce the schedule search space. Our
algorithm is based on the hypothesis that in practice, most
concurrency bugs can be uncovered by restricting our search to only a few
variables at a time. At any one time, we search only for bugs
involving a small subset of v variables. These variables may include
synchronization variables. Iteratively, we consider all such variable
subsets. For a given subset of variables, we perform static alias
analysis to identify all program locations where these variables may
be accessed. We instrument only these program locations. This
selective instrumentation allows us to run our program at near-native
speed. Consequently, our approach can interpose on any variable
accesses, and not just synchronization variables as reported in
previous work. We show that using variable bounding, the search space
reduces by a factor of roughly $(Q/v)^{c-v}$ when searching for bugs
with context-bound c and variable bound v, where Q is the total
number of variables in the program. We confirm this result
experimentally by showing that variable bounding allows faster discovery
of concurrency bugs.

Our second contribution is characterizing a concurrency bug by 
the number of distinct threads that need to be order-constrained 
to uncover the bug. A bug that can be uncovered by constraining 
the order of t threads is called a t-thread bug. In practice, most
bugs have a small t. We provide a randomized algorithm with 
guarantees on the probability of uncovering a t-thread bug, if it 
exists. Using thread-bounding, the search space decreases by a 
factor of ( t+1 )" ; ! og („) when searching for bugs with thread-bound 
t out of a total of n program threads. 

We note that our hypothesis that most bugs can be uncovered at
low (v, t) values conforms with the observations made in previous
work on studying real-world concurrency bug characteristics [16].

The paper is organized as follows. Section 2 presents and
analyzes variable bounding for exhaustive model-checking algorithms.
Section 3 discusses variable bounding for randomized algorithms
and analyzes the resulting probabilistic guarantees of finding a bug,
if one exists. Section 4 discusses thread bounding. Sections 5 and 6
discuss our implementation and empirical results. Section 7
discusses related work, and Section 8 concludes.

2. Variable Bounding 

Recent work on studying characteristics of real-world concurrency
bugs [16] concluded that 66% of the non-deadlock concurrency
bugs examined involved only one variable. Perhaps the most
common type of concurrency bug involving one variable is a
data race, i.e., simultaneous accesses to a shared variable (at least
one of which is a write) by two or more threads without proper
synchronization. Also, among the remaining fraction of non-deadlock
concurrency bugs, most bugs involve only a few variables (typically
2 to 3). This observation motivates our ranking on the number of
memory locations involved. We first enumerate schedules that
exhaustively check all thread interactions involving a single variable.
We then enumerate schedules that exhaustively check thread
interactions involving two variables, and so on.

We first discuss variable bounding in the context of a
model-checker. For a model-checker like CHESS [20], a custom priority
scheduler implements the exhaustive enumeration of schedules,
and context-bounding [19] is used to limit the number of schedules
executed. To implement variable bounding, we first identify all
program variables (or points in the program that generate new
variables) by parsing the program. These program variables include
globals and heap-allocated variables (allocated using malloc() or
new). A heap variable is identified and named by its allocation
statement and the number of times that statement has been invoked.
For example, if a particular new statement is called multiple times,
we will consider each return value as a separate variable. We call
this set of program variables $\vartheta$. Iteratively, we take all v-sized
subsets of variables in $\vartheta$ for $v \in \{1, 2, 3, \ldots\}$. For a subset V of
size v, we execute schedules that explore all interactions between
all variables in V.
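The iteration over variable subsets in increasing size can be sketched as follows. This is only an illustration: the function name `variable_subsets` and the variable names in the usage example (a global plus heap variables named by allocation statement and invocation count) are hypothetical.

```python
from itertools import combinations
from typing import Iterable, Iterator, Tuple


def variable_subsets(variables: Iterable[str], max_v: int) -> Iterator[Tuple[str, ...]]:
    """Yield all subsets of size 1, then size 2, and so on up to size max_v."""
    pool = sorted(variables)
    for v in range(1, max_v + 1):
        yield from combinations(pool, v)


if __name__ == "__main__":
    # Hypothetical names: one global and two heap variables identified by
    # (allocation statement, invocation count).
    names = ["g_counter", "list.c:42#1", "list.c:42#2"]
    for subset in variable_subsets(names, max_v=2):
        print(subset)
```

Each yielded subset would then be handed to the schedule enumerator, so that all single-variable interactions are explored before any two-variable interaction.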

To identify variables, we instrument heap allocation statements
to generate a new variable name for each invocation of the
statement. As we explain later, we also prioritize the variables which
are generated in the first few loop iterations. To identify
interactions between a subset of variables, we instrument accesses to these
variables. We use a lightweight and imprecise static alias
analysis [26] to identify program points at which each variable in
$\vartheta$ may be accessed. Our static analysis assumes that the program is
memory-safe, i.e., locations outside allocation boundaries will not
be accessed. Memory-safety can be separately checked using other
available tools.

Without variable bounding, all accesses to all variables must be
instrumented with a call to the scheduler which implements
exhaustive schedule enumeration. With variable bounding, this
instrumentation can be significantly reduced. For a variable $x_i \in \vartheta$, we call
the set of program locations at which it may be accessed $\alpha_{x_i}$. With
variable bounding, we only check interactions within a variable
subset $V = \{x_0, x_1, \ldots, x_v\}$, and instrument all locations in the
set $(\alpha_{x_0} \cup \alpha_{x_1} \cup \cdots \cup \alpha_{x_v})$. The instrumentation code includes a call
to a scheduler function, varaccess(), that yields to the scheduler
which implements priority scheduling and systematic pre-emption.
varaccess() is inserted after the program has accessed and
possibly updated the variable. To ensure that pre-emption occurs only
on accesses to the set of tracked variables, the instrumentation code
dynamically checks that the accessed memory address is one of the
tracked variables before calling varaccess(). The varaccess()
call serves as a potential yield point (or context-switch point), i.e.,
at this point, the scheduler can choose to run another thread. To
allow a thread to be pre-empted before its first access to a variable,
we also insert a fake varaccess() before the first instruction of
each thread. Our enumeration algorithm is similar to that used in
CHESS [20], and we discuss it in Section 5.
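The dynamic check that guards each varaccess() call can be sketched as below. This is a simplified model, not the actual instrumentation: the class names `RecordingScheduler` and `Instrumentation` are hypothetical, and the scheduler here only records yield points instead of switching threads.

```python
class RecordingScheduler:
    """Stand-in for the real scheduler: it only records yield points."""

    def __init__(self):
        self.yield_points = []

    def varaccess(self, thread_id, address):
        # A real scheduler could pre-empt the calling thread here.
        self.yield_points.append((thread_id, address))


class Instrumentation:
    """Code that would be inserted at statically identified access locations."""

    def __init__(self, scheduler, tracked_addresses):
        self.scheduler = scheduler
        self.tracked = set(tracked_addresses)

    def thread_start(self, thread_id):
        # "Fake" access so a thread can be pre-empted before its first access.
        self.scheduler.varaccess(thread_id, None)

    def after_access(self, thread_id, address):
        # Called after the program has read or written `address`. The static
        # alias analysis is imprecise, so the address is re-checked at run time.
        if address in self.tracked:
            self.scheduler.varaccess(thread_id, address)
```

Because the alias analysis may over-approximate, some instrumented locations touch untracked addresses at run time; the membership test keeps those from becoming yield points.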

Bug Characterization 

We call a concurrency bug a c-context bug if at least c pre-emptive
context switches are required for the bug to manifest. c is also called
the bug's context-bound. This definition of context bound is taken
from previous work [13].

We call a concurrency bug a v-variable bug if the minimal set of
constraints required to manifest the bug involves preemption points
at accesses to v distinct variables. v is also called the bug's variable
bound. By definition, $v \le c$ for any (c, v) bug.

Figures 1, 2, 3, and 4 show short programs with (c = 0, v = 0),
(c = 1, v = 1), (c = 2, v = 1), and (c = 2, v = 2) bugs, respectively,
for exposition. In these short programs, we count a pre-emption 
against the shared variable that was last accessed. Also, we assume 
that a bug exists if the ASSERT statement can fail. 
a = 0

Thread 1:
ASSERT(a == 0);

Thread 2:
a++;

Figure 1. A short program with a c = 0, v = 0 bug



a = 0

Thread 1:
t1 = a;
t2 = a;
ASSERT(t1 == t2);

Thread 2:
a++;

Figure 2. A short program with a c = 1, v = 1 bug



a = 0

Thread 1:
t1 = a;
t2 = a;
ASSERT(t1 == t2);

Thread 2:
a = 1;
a = 0;

Figure 3. A short program with a c = 2, v = 1 bug
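The context bounds stated for the programs of Figures 2 and 3 can be checked mechanically with a small explorer. The sketch below is our own illustration, not the enumeration algorithm used in our tool. It assumes that a starts at 0, that each statement (including a++) executes atomically, and that a switch counts as a pre-emption only when the thread being left could still run. All function names are hypothetical.

```python
import copy


def bug_reachable(threads, shared, bound):
    """Return True if some schedule with at most `bound` pre-emptions
    makes a step raise AssertionError.

    threads: list of threads; each thread is a list of steps, and each
    step is a function step(shared, local) executed atomically.
    """
    n = len(threads)

    def dfs(pcs, current, shared, local_vars, used):
        enabled = [t for t in range(n) if pcs[t] < len(threads[t])]
        for t in enabled:
            # Leaving a thread that could still run is a pre-emption.
            cost = 1 if (current in enabled and t != current) else 0
            if used + cost > bound:
                continue
            s, l = copy.deepcopy(shared), copy.deepcopy(local_vars)
            try:
                threads[t][pcs[t]](s, l[t])
            except AssertionError:
                return True
            next_pcs = pcs[:t] + (pcs[t] + 1,) + pcs[t + 1:]
            if dfs(next_pcs, t, s, l, used + cost):
                return True
        return False

    return dfs((0,) * n, None, shared, [{} for _ in range(n)], 0)


def read_a(dst):
    def step(shared, local):
        local[dst] = shared["a"]
    return step


def write_a(value):
    def step(shared, local):
        shared["a"] = value
    return step


def incr_a(shared, local):
    shared["a"] += 1


def check_equal(shared, local):
    if local["t1"] != local["t2"]:
        raise AssertionError("t1 != t2")


def min_context_bound(threads, shared, limit=4):
    """Smallest number of pre-emptions that exposes the bug, or None."""
    for c in range(limit + 1):
        if bug_reachable(threads, shared, c):
            return c
    return None


thread1 = [read_a("t1"), read_a("t2"), check_equal]
figure2 = [thread1, [incr_a]]
figure3 = [thread1, [write_a(1), write_a(0)]]

if __name__ == "__main__":
    print(min_context_bound(figure2, {"a": 0}))  # 1
    print(min_context_bound(figure3, {"a": 0}))  # 2
```

In both programs every pre-emption happens at an access to a, which is why both are v = 1 bugs even though their context bounds differ.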



a = 0, b = 0

Thread 1:                        Thread 2:
t1 = a;                          a = 1;
t2 = a;                          b = 1;
t3 = b;                          b = 0;
ASSERT(t1 == t2 or t3 != 1);

Figure 4. A short program with a c = 2, v = 2 bug

Schedule Characterization

A schedule is characterized by the number of pre-emptive context
switches (c) in it. We further characterize a schedule by v, the
number of distinct variables at which a pre-emptive context switch
was performed. By definition, for any schedule, $v \le c$.

Search Space Reduction

We now discuss how variable bounding helps reduce the search
space. Let us assume that a multi-threaded program with n threads
has Q distinct shared variables, represented as a set $\vartheta$ of variables,
i.e., $|\vartheta| = Q$. For simplicity, let us also assume that each thread
in the program accesses each variable in $\vartheta$ exactly d times. Hence,
the total number of variable accesses by a thread is dQ. Assuming
that only accesses to these shared variables are interesting
context-switch points, k = ndQ (k is the number of steps in a
program). Therefore, the number of schedules that need to be
explored at context bound c is $O((ndQ)^c)$. Let us call
this expression A.

If we focus on a subset $V \subset \vartheta$ of v variables, the number
of schedules that need to be explored at context bound c is
$\binom{Q}{v}(ndv)^c$ (first choose a subset $V \subset \vartheta$, then explore all
schedules with preemptions at accesses to variables in V). Assuming
$v, c \ll Q$, this expression is $O(Q^v (ndv)^c)$. Comparing with A,
we see that this expression is less than A if v < c. This reduction in
the search space (number of execution runs) is significant for
programs with a large number of variables (large Q). Apart from this
reduction in the number of execution runs, the time taken by each
execution run also decreases dramatically with variable bounding,
as only the accesses to variables being tracked need to be
instrumented. We study both these improvements in detail in our
experiments in Section 6.

At v = c, variable bounding provides no improvement in the
size of the search space. But, a significant reduction in runtime
happens because of much lower instrumentation overhead; only
the tracked variables need to be instrumented now. Effectively, by
slicing the program into accesses to a small subset of variables,
we reduce the number of program steps k. This is because only
accesses to the variables being tracked are considered valid context
switch points. As we discuss in our experiments (Section 6), this
reduction is significant for most programs. This method of reducing
k is more general than the approach used in previous tools (e.g.,
CHESS [20]) where all accesses to non-synchronization variables
are ignored.

While we have used a simplified assumption of a constant number
of accesses d to each variable by each thread, the result does not
change (although the analysis gets more involved) if we assume a
varying number of accesses by each thread to different variables.
The same result can be obtained by replacing d with the average
number of accesses by a thread to a randomly-chosen variable, and
we skip this discussion for brevity. We analyze a more general
scenario in our discussion on probability bounds for randomized
bug-finding algorithms (Section 3).


Heap Allocated Variables and Arrays 

Our set of tracked variables includes heap-allocated variables.
Heap-allocated variables are named using the heap-allocation statement
and the number of times that statement was executed before this 
variable was generated. A large number of heap allocations by one 
statement can generate a large number of variables causing our 
variable-bounding algorithm to get stuck at low v values. 

In our experience, if the program contains a bug involving 
a certain type of heap variable, the bug usually manifests while 
tracking the first few variables of that type. For example, if the 
program constructs and accesses a heap data structure (e.g., linked 
list), it is very likely that a bug, if it exists, will be exposed by 
exploring all interactions among the first few elements of that data 
structure. 

The challenge is to identify and group variables of a certain 
type, so that only the first few variables of that type are considered. 
We use a simple heuristic that we found to work well in practice. 
The type of a variable is defined by the callstack at the time of al- 
location of that variable. We expect that largely, variables allocated 
with identical callstacks are of the same type. This heuristic is nei- 
ther sound nor complete. For example, it is possible that variables 
of the same type are allocated at different points in the program, 
hence having different callstacks. This can cause our algorithm to 
execute more than the required number of schedules. A more se- 
rious problem is that two identical callstacks could generate com- 
pletely different types of variables. This can cause our algorithm to 
overlook certain bugs. Fortunately, in practice, such code is rare. 

The algorithm works as follows. For each heap allocation, we 
generate a new variable ID labeled by the location of the allocation 
statement and the number of times that statement was executed. 
With each variable ID, we also associate the number of times this 
allocation statement has previously been executed with an identical 
callstack. We call this latter number the loop iteration number
(because the allocations with identical callstacks must be happening
through a loop) of that variable. We first search for bugs involving
variables with lower loop iteration numbers before searching for
bugs involving variables with higher loop iteration numbers. We
call this algorithm loop-iteration bounding and denote the current
loop-iteration number being searched with the letter I. Figure 5 shows
our logic for implementing loop iteration numbers.
<instrumentation code for new()>
callstack := get_current_callstack();
v := <heap-allocation-statement, alloc#>;
lin := loop_iteration_number(callstack);
increment_loop_iteration_number(callstack);
if (lin <= I) {
    add v to the set of variables to be tracked;
}

Figure 5. Instrumentation code for heap-allocation statements that
considers only variables with loop-iteration number at most I.
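The logic of Figure 5 can be written out as a small runnable sketch. The class name `AllocationTracker` and its fields are hypothetical. We assume here that both the allocation number and the loop-iteration number count prior executions (so the first allocation gets 0), and that a callstack is represented as a hashable tuple of frames.

```python
from collections import defaultdict


class AllocationTracker:
    """Names heap variables and tracks only those with a low loop-iteration number."""

    def __init__(self, limit):
        self.limit = limit                     # the bound called I in the text
        self.per_site = defaultdict(int)       # allocation statement -> prior executions
        self.per_callstack = defaultdict(int)  # callstack -> prior allocations
        self.tracked = set()

    def on_alloc(self, site, callstack):
        """Called from the instrumentation of a heap-allocation statement."""
        var_id = (site, self.per_site[site])   # <heap-allocation-statement, alloc#>
        self.per_site[site] += 1
        lin = self.per_callstack[callstack]    # loop iteration number
        self.per_callstack[callstack] += 1
        if lin <= self.limit:
            self.tracked.add(var_id)
        return var_id, lin
```

For example, a loop that allocates four list nodes from the same callstack with limit 1 tracks only the first two nodes, while an allocation at the same statement reached through a different callstack starts again at loop-iteration number 0.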

We also need special handling for array variables. Whenever 
possible, we treat each location in the array as a separate variable. If 
the search space size becomes unmanageably large (for high values 
of v), we use a less precise but sound approach of considering the 
whole array as a single variable. 

3. Variable Bounding on Randomized Algorithms 

Apart from exhaustive state space exploration to ascertain the ab- 
sence of certain bugs, randomized schedulers that provide proba- 
bilistic guarantees of finding certain types of bugs have also been 
proposed. Depth-bounding [5] (also called Probabilistic Concurrency
Testing in the paper) is one such approach. The primary advantage
of randomized approaches over exhaustive search is that
the former can cover a large part of the program in relatively fewer 
runs. Exhaustive search, on the other hand, can get stuck in local 
regions of the program for long periods of time causing bugs in 
other regions to go undetected. In this section, we discuss variable 
bounding in the context of randomized search. 

In particular, we study Probabilistic Concurrency Testing (PCT) [5],
which proposed the bug-depth metric. While we analyze only PCT,
similar arguments will hold for other randomized algorithms. For
a program spawning at most n threads and executing at most k
total instructions, the PCT algorithm works as follows (for an input
parameter d, denoting the depth of the bug being searched):

1. Assign n priority values d, d + 1, ..., d + n - 1 randomly to the n
threads.

2. Pick d - 1 priority change points $k_1, \ldots, k_{d-1}$ randomly in the
range [1, k]. Each $k_i$ has an associated priority value of i.

3. Schedule the threads by honoring their priorities, i.e., always
execute an enabled thread with the highest priority. When a
thread reaches the i-th change point (i.e., executes the $k_i$-th
instruction), change the priority of that thread to i.

Burckhardt et al. [5] proved that this algorithm finds a bug of depth
d with probability at least $1/(nk^{d-1})$.
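The three PCT steps above can be sketched as a schedule generator. This is a simplified model written for illustration, not the implementation of [5]: it assumes threads never block (a thread stays enabled until it finishes), that thread lengths are known up front, and that change points are drawn independently. The function name `pct_schedule` is ours.

```python
import random


def pct_schedule(thread_lengths, d, rng):
    """Return one PCT schedule as a list of thread indices.

    thread_lengths[t] is the number of steps thread t executes.
    """
    n, k = len(thread_lengths), sum(thread_lengths)
    # Step 1: distinct initial priorities d, ..., d + n - 1, all above the
    # values 1..d-1 used at change points.
    priority = list(range(d, d + n))
    rng.shuffle(priority)
    # Step 2: change point i (1-based) fires at the k_i-th executed step.
    change_points = [rng.randint(1, k) for _ in range(d - 1)]
    remaining = list(thread_lengths)
    schedule = []
    # Step 3: always run the enabled thread with the highest priority.
    while any(remaining):
        enabled = [t for t in range(n) if remaining[t] > 0]
        t = max(enabled, key=lambda i: priority[i])
        remaining[t] -= 1
        schedule.append(t)
        for i, k_i in enumerate(change_points, start=1):
            if k_i == len(schedule):
                priority[t] = i  # lower the running thread's priority to i
    return schedule


if __name__ == "__main__":
    print(pct_schedule([3, 2, 4], d=2, rng=random.Random(0)))
```

With d = 1 there are no change points, so each thread runs to completion in priority order; larger d allows up to d - 1 priority drops that can pre-empt the running thread.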

We implement variable bounding on PCT by first randomly 
choosing a set of v variables, and then randomly choosing d - 1
priority change points at one of the accesses to the chosen variables 
(other instructions in the program are not considered as potential 
priority change points). For heap-allocated variables, we simply 
choose a heap allocation statement (new and malloc) in lieu of 
a variable. Accesses to any of the variables allocated at the chosen 
heap-allocation statement are considered potential priority change 
points. 

Notice that using a heap-allocation statement as one "variable" 
in the randomized algorithm is a departure from the strategy used 
in the exhaustive-search strategy, where each heap allocation is 
considered a separate variable. This is done to ensure that we 
know the number of these variables at compile time, and hence 
can appropriately choose a variable set to provide probabilistic 
guarantees. Under this new definition of a variable, a v-variable
bug is a bug that involves memory locations allocated at no more than v
distinct heap-allocation statements (or globals). This new definition
performs a coarser classification of the program's memory locations.
This could potentially increase the number of executions required
for effective state space search for the same v value. However, this
is still a significant improvement over not using variable bounding 
at all. Also, this definition of variable bounding does not make our 
argument on most bugs having low variable bounds any weaker. 

Assume that the total number of global variables and heap allo- 
cation statements in a program is Q. We change the PCT algorithm 
to implement variable bounding as follows: 

0. Choose a set of v variables $q_1, \ldots, q_v$ representing the minimal
set of variables involved in the bug being searched (v < d).

1. Assign n priority values d, d + 1, ..., d + n - 1 randomly to the n
threads.

2. Let $k_{q_1}, \ldots, k_{q_v}$ denote upper bounds on the number of
instructions accessing $q_1, \ldots, q_v$ respectively in any run of the program.
Hence, variable $q_r$ is accessed at most $k_{q_r}$ times in any
execution of the program. Construct a set S of elements of the
form $(q_r, j)$, where r is in the range [1, v] and j is in the range
$[1, k_{q_r}]$. The set S will have $\sum_{r=1}^{v} k_{q_r}$ elements. Pick
d - 1 random elements from S to represent the priority change
points.

3. Schedule the threads by honoring their priorities. For the i-th
chosen element $(q_{r_i}, j_i)$ in the previous step, force a priority
change point at the $j_i$-th access of variable $q_{r_i}$, i.e., change
the priority of the thread at this point to i.
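Steps 0 and 2 of the modified algorithm, which are what distinguish it from plain PCT, can be sketched as follows. The function name `pctvb_change_points` and the example variable names are hypothetical, and we assume that the d - 1 change points are drawn independently, so repeats are possible.

```python
import random


def pctvb_change_points(access_bounds, v, d, rng):
    """Pick v variables, then d - 1 priority change points among their accesses.

    access_bounds maps each variable (or allocation statement) q to k_q, an
    upper bound on the number of accesses to q in any run.
    Returns (chosen variables, list of (variable, access index) pairs); the
    i-th pair (1-based) is where the running thread's priority becomes i.
    """
    # Step 0: choose the v variables to track in this run.
    chosen = rng.sample(sorted(access_bounds), v)
    # Step 2: S contains one element per possible access to a chosen variable.
    S = [(q, j) for q in chosen for j in range(1, access_bounds[q] + 1)]
    # Assumption: change points are drawn independently (repeats allowed).
    points = [rng.choice(S) for _ in range(d - 1)]
    return chosen, points


if __name__ == "__main__":
    bounds = {"hot_counter": 1000, "rare_flag": 3, "queue_head": 50}
    print(pctvb_change_points(bounds, v=1, d=3, rng=random.Random(1)))
```

Note that a rarely accessed variable such as the hypothetical `rare_flag` is chosen as often as a hot one, and once chosen, each of its few accesses is a likely change point.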

We call this modified algorithm PCTVB (PCT with Variable 
Bounding). Unlike PCT, where the priority change points $k_i$ are
chosen randomly from 1, ..., k, PCTVB first picks a set of variables
(or heap-allocation statements), and then chooses priority
change points among the accesses to this set. PCTVB has two ad- 
vantages over PCT: 

1. As we show below, PCTVB improves the probability bound on 
finding a bug with depth d and variable-bound v. Because most 
bugs have low v, this results in overall improvement in the bug- 
finding probability. 

2. Choosing a set of variables a priori allows us to instrument
only the program points that can potentially access these vari- 
ables. These program points are identified using static (impre- 
cise) alias analysis. This is a significant improvement over PCT 
where all variable accesses need to be instrumented. 

Probabilistic Guarantees with Variable Bounding 

The analysis of the probabilistic guarantees of PCTVB is identical 
to that of PCT, as presented in the original paper [5], and we omit
it for brevity. We simply revisit Theorem 9 (without proof) of the 
original paper with our new variable bounding enhancement. 

THEOREM 3.1. Let P be a program with a bug B of depth d
and $q_1, \ldots, q_v$ be the minimal set of unique variables, accesses to
which need to be preempted to trigger B. For a variable $q_i$, let
$k_{q_i} \ge maxaccesses(P, q_i)$. Assuming $n \ge maxthreads(P)$,

$$\Pr[PCTVB(n, k, d, q_1, \ldots, q_v) \in B] \ge \frac{1}{n \left(\sum_{i=1}^{v} k_{q_i}\right)^{d-1}}$$

Here, B is the set of schedules that expose the depth-d bug in
the program. $maxaccesses(P, q_i)$ returns the maximum number
of accesses made by P to variable $q_i$ in any single run.
$maxthreads(P)$ is the maximum number of threads spawned
in P. The proof is identical to that of the original theorem, and is
obtained by simply replacing k with $\sum_{i=1}^{v} k_{q_i}$.
For a program with Q total global variables and heap allocation
statements, the probability that we pick the correct v variables
($q_1, \ldots, q_v$) to trigger the v-variable bug (if it exists) is
$1/\binom{Q}{v}$. Hence, the probability of finding the v-variable bug is

$$\Pr[PCTVB(n, k, d, v) \in B] \ge \frac{1}{\binom{Q}{v}\, n \left(\sum_{i=1}^{v} k_{q_i}\right)^{d-1}}$$

This expression depends on the sum of the access frequencies
$k_{q_1}, \ldots, k_{q_v}$ of the variables $q_1, \ldots, q_v$. Given that the total number
of variables is Q, and the total number of variable accesses in any
single run is at most $k = \sum_{i=1}^{Q} k_{q_i}$, we expect this sum to be
less than $\frac{kv}{Q}$ on average (averaged over all v-sized sets of global
variables and heap allocation statements). Let us assume that the
sum is $f\frac{kv}{Q}$, where f < 1 on average but could be higher depending
on the set of chosen variables. Upper-bounding $\binom{Q}{v}$ with $O(Q^v)$
for small values of v, this expression evaluates to

$$\Pr[PCTVB(n, k, d, v) \in B] \ge \frac{Q^{d-v-1}}{n (kvf)^{d-1}}$$

Comparing this with PCT's original bound of $\frac{1}{nk^{d-1}}$, variable
bounding helps if

$$Q^{d-v-1} > (vf)^{d-1}$$

Assuming $v \ll Q$, variable bounding significantly improves the
lower bound on probability if v < d - 1 and f is small. In other
words, variable bounding helps if the bug being searched involves
fewer variables than its bug depth, and these variables are accessed
less often than the average access frequency.
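The comparison between the two lower bounds can be evaluated numerically. The sketch below uses the simplified forms $1/(nk^{d-1})$ for PCT and $Q^{d-v-1}/(n(kvf)^{d-1})$ for PCTVB, that is, with $\binom{Q}{v}$ approximated by $Q^v$. The parameter values are illustrative only, and the function names are hypothetical.

```python
from fractions import Fraction


def pct_bound(n, k, d):
    """PCT lower bound 1 / (n * k^(d-1))."""
    return Fraction(1, n * k ** (d - 1))


def pctvb_bound(n, k, d, v, Q, f):
    """PCTVB lower bound Q^(d-v-1) / (n * (k*v*f)^(d-1)), with C(Q, v) ~ Q^v."""
    f = Fraction(f)
    return Fraction(Q) ** (d - v - 1) / (n * (k * v * f) ** (d - 1))


if __name__ == "__main__":
    n, k, d, Q = 4, 10_000, 3, 100  # illustrative values only
    for v, f in [(1, 1), (1, Fraction(1, 2)), (2, 1)]:
        ratio = pctvb_bound(n, k, d, v, Q, f) / pct_bound(n, k, d)
        print(f"v={v} f={f}: PCTVB/PCT bound ratio = {ratio}")
```

The ratio equals $Q^{d-v-1}/(vf)^{d-1}$, so it exceeds 1 exactly when the inequality above holds; in this example it is 100 for v = 1, f = 1 but only 1/4 for v = 2 = d - 1.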

A case of particular interest is bugs with variable bound v = 1,
as they are by far the most common. The inequality shows that
the probability of finding a 1-variable bug of depth 2 or higher
improves significantly if f < 1. In other words, the probability
of finding bugs involving "corner variables" (variables used rarely
compared to others) improves with variable bounding. Intuitively,
variable bounding gives all variables an equal chance, while plain
depth-bounding (or context-bounding) gives higher chance to more
frequently-accessed variables. We confirmed this experimentally
by writing a small program with two variables and varying the
relative access frequencies of the variables. One of the two variables
was involved in a c = 1, v = 1, t = 2 concurrency bug. Figure 6
shows that as the frequency of access to the variable containing the
bug is decreased, PCTVB requires fewer executions to find the bug
compared to PCT.
Figure 6. Number of executions required (on average) to trigger
the bug for PCT and PCTVB as the access frequency of the buggy
variable is changed.

We profile the access frequencies of variables in different programs
in Figure 7. The details of these programs are given in the table of
benchmarks. Clearly, the number of accesses varies widely across
different variables for almost all benchmarks. Typically, we expect
variables with fewer accesses to undergo relatively less testing and thus
have higher likelihood of having bugs. Even if we assume that all 
variables are equally likely to contain bugs, we see that variable 
bounding improves the overall probability of finding a bug (if one 
exists). We present a simple example. Consider a program with a
v = 1, d = 2 bug that manifests if a certain priority sequence is
followed and a priority change point occurs on a certain access $a_{q_b}$
to variable $q_b$. Assume there are Q different variables in the
program, and each variable $q_i$ is accessed at most $k_{q_i}$ times
in any one run of the program. Hence the probability of uncovering
the bug is the product of the probability that we pick the correct
priority sequence and the probability that we choose $a_{q_b}$ as the lone
priority change point. The former is independent of variable bounding.
Below, we compare the latter, with and without variable bounding.

Without variable bounding, the probability of picking a qb as a 
priority change point is at least ^ 1 , — (let's call this expression 

El). This expression is simply the probability of choosing a qb 
among ^ q . k qi potential priority change points. Notice that El 
is independent of k qb . 

With variable bounding, we first choose a variable and then 
choose an access point of that variable. Hence, the probability that 



we pick a qb as a priority change point is > 



Q-fc, 



(the probability 



that we pick qt multiplied by the probability that we pick a qb ). 
This expression depends on qb and k qb . Assuming each variable 
is equally likely to contain a bug, further computing the expected 
value of this expression over all qb, we get -k Y] ttc — (let's call 

° <V — 'Qb W K q b 

this expression E2). 

Comparing E1 and E2, and using Jensen's inequality, we get

$$\frac{1}{\sum_{i=1}^{Q} k_{q_i}} \leq \frac{1}{Q} \sum_{i=1}^{Q} \frac{1}{Q \cdot k_{q_i}}$$

or E1 ≤ E2, with equality only when
$k_{q_1} = k_{q_2} = \cdots = k_{q_Q}$. For typical access patterns to
variables in common programs (see Figure 7), E2 is expected to be
significantly higher than E1. Hence, assuming all variables are equally
likely to have a bug, variable bounding provides a tighter bound (E2)
on the probability of finding the bug at v = 1, d = 2. A similar
argument holds for higher values of v and d, and we omit the
discussion for brevity.
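The comparison between E1 and E2 can be checked numerically. The following is a minimal sketch, not part of RankChecker; the function names and the access-count profiles are hypothetical and chosen only for illustration.

```python
from fractions import Fraction


def e1(access_counts):
    """Bound without variable bounding: 1 / (sum of all k_{q_i})."""
    return Fraction(1, sum(access_counts))


def e2(access_counts):
    """Expected bound with variable bounding: (1/Q) * sum_i 1 / (Q * k_{q_i})."""
    q = len(access_counts)
    return sum(Fraction(1, q * k) for k in access_counts) / q


# Hypothetical access-count profiles (assumptions, not measured data).
skewed = [1, 1, 2, 1000]     # most variables are accessed rarely
uniform = [10, 10, 10, 10]   # every variable is accessed equally often

for name, profile in (("skewed", skewed), ("uniform", uniform)):
    print(name, float(e1(profile)), float(e2(profile)))
```

With the uniform profile the two bounds coincide, which is the equality case of the inequality; with the skewed profile E2 is larger by more than two orders of magnitude.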



4. Thread Bounding 

Previous work on studying concurrency bugs found that most
concurrency bugs can be discovered by enforcing ordering constraints
between a small number (typically two) of threads [16]. This is
our inspiration for using thread-bounding while searching for
concurrency bugs. We call a bug that requires ordering constraints
between at least t distinct threads to be uncovered a t-thread bug; t is
also called the thread-bound of the bug. By definition, the
thread-bound of a concurrency bug is always 2 or higher. Notice that our
definition of thread-bound also counts the threads that should not
be executed for a bug to manifest. For example, a bug that
manifests only if thread A is executed after thread B and thread C is
not executed in between will be called a 3-thread bug, and not a
2-thread bug. Also, t is independent of c and v, i.e., a c context-bound
bug and a v variable-bound bug can have any thread bound t ≥ 2.
Figures 8, 9, and 10 show short programs with (c = 0, v = 0, t = 3),
(c = 1, v = 1, t = 3), and (c = 2, v = 2, t = 3) bugs,
respectively.

We posit that the number of schedules required to uncover
a t-thread bug increases with t. For example, for a program
with n threads $T_1, \ldots, T_n$, at context-bound c = 0, all
2-thread bugs can be uncovered by only two schedules, namely
$\{T_1, T_2, T_3, \ldots, T_{n-1}, T_n\}$ and
$\{T_n, T_{n-1}, T_{n-2}, \ldots, T_2, T_1\}$.
This is because for any subset of 2 threads $\{T_i, T_j\}$, both orders
Figure 7. This figure plots the variable access frequency profile for six of our benchmarks. The values on the x-axis represent the frequency
of access of a variable, and the y-axis plots the number of variables that are accessed at that frequency. For example, in tsp, 19 variables are 
accessed 0 or 1 times (first vertical bar), and only 1 variable is accessed 2-3 times. These access frequencies were determined after running
our benchmarks multiple times on different inputs and averaging the results. 



Thread 1: 
a++; 



Thread 2: 
a++; 



Thread 3: 
ASSERT(a!=2); 



Figure 8. A short program with a (c = 0, v = 0, t = 3) bug



between $T_i$ and $T_j$ (i.e., $\{T_i, T_j\}$ and $\{T_j, T_i\}$) are covered by
these two schedules. In other words, if we arrange the threads in an
arbitrary permutation, enumerating two orders (increasing and
decreasing) is enough to uncover all 2-thread bugs at context bound
0.

A similar argument holds for t-thread bugs where t > 2. At
c = 0, it suffices to enumerate enough schedules to explore all t!
relative orderings of all t-sized subsets of the n threads, to uncover






a = 0

Thread 1:
t1 = a;
t2 = a;
ASSERT(t1 < t2+1);

Thread 2:
a++;

Thread 3:
a++;

Figure 9. A short program with a (c = 1, v = 1, t = 3) bug



a = 0

Thread 1:
t1 = a;
t2 = a;
t3 = b;
t4 = b;
ASSERT(t1==t2 or t3==t4);

Thread 2:
a++;

Thread 3:
b++;

Figure 10. A short program with a (c = 2, v = 2, t = 3) bug



a t-thread bug. To do this, we require an algorithm that generates
enough permutations of n numbers, such that all t! permutations of
all t-sized subsets of the n numbers are exhaustively covered.

Lemma 4.1 presents a randomized algorithm to enumerate all
t! permutations of all t-sized subsets of n numbers using
O((t + 1)! log(n)) permutations of n numbers with high
probability. Notice that the algorithm has only logarithmic growth
with n, as opposed to n! growth without thread bounding.

LEMMA 4.1. The number of independent random permutations of
n numbers that need to be generated to observe all t! relative
orderings of all $\binom{n}{t}$ subsets of size t with probability at
least $(1 - \epsilon)$ is
$(t+1)!\,(\log(nt) + \log(\frac{1}{\epsilon}))$.

Proof. Let N be a set of n distinct elements. Consider a fixed subset
$S \subset N$ of t elements and let $\pi$ be some arbitrary permutation
of S. For any random permutation $\sigma$ of n elements, the probability
that $\pi$ is a subsequence of $\sigma$ is $\frac{1}{t!}$ (by argument of
symmetry). Hence, the probability of $\pi$ not appearing in $\sigma$ is
$(1 - \frac{1}{t!})$. If we enumerate P independent random permutations
of n numbers, the probability of $\pi$ not appearing in any of the P
permutations is $(1 - \frac{1}{t!})^P$. For a fixed permutation $\pi$,
let us denote this probability of $\pi$ not appearing in any of the P
permutations by $F_\pi$.

There are $\binom{n}{t}$ subsets of N of size t, each having t!
permutations. Let us denote this set of permutations by $\Omega$. The
probability that any one of the permutations in $\Omega$ is not observed
in P random permutations of n numbers is upper-bounded by the sum of
individual probabilities
$\sum_{\pi \in \Omega} F_\pi = t! \binom{n}{t} F_\pi$. We require this
quantity to be less than $\epsilon$:

$$t! \binom{n}{t} \left(1 - \frac{1}{t!}\right)^P < \epsilon$$

Writing P as $(t!\,Q)$, and approximating $(1 - \frac{1}{t!})^{t!}$ by
$\frac{1}{e}$,

$$t! \binom{n}{t} \left(\frac{1}{e}\right)^Q < \epsilon$$

Approximating t! by $t^t$, and $\binom{n}{t}$ by $n^t$,

$$Q > t \log(nt) + \log\left(\frac{1}{\epsilon}\right)$$

Replacing Q with P (and using $t \cdot t! < (t+1)!$), it suffices that

$$P > (t+1)! \left(\log(nt) + \log\left(\frac{1}{\epsilon}\right)\right)$$

Remark: Even if $\epsilon$ is inverse-exponential in n, P is still linear in n.



As an example, given a maximum of n threads, at t = 3, it
suffices to enumerate (24 log(n)) random permutations of the n
numbers to observe all 3! relative orderings of all $\binom{n}{3}$
subsets with high probability. For n = 600, we found using simulations
that 70, 360 and 2000 random permutations were enough to generate
all relative orders of all $\binom{n}{t}$ subsets for t = 3, t = 4 and
t = 5 respectively, with more than 99% probability.
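A simulation of this kind can be sketched as follows. This is an illustrative reconstruction under our own assumptions, not the simulation code used for the numbers above; the function name `permutations_until_covered` and the small parameters (n = 8, t = 3) are hypothetical.

```python
import random
from itertools import combinations
from math import comb, factorial


def permutations_until_covered(n, t, rng):
    """Draw random permutations of n thread ids until every relative
    ordering of every t-sized subset has appeared as a subsequence.
    Returns the number of permutations drawn."""
    target = comb(n, t) * factorial(t)   # orderings that must be observed
    seen = set()
    threads = list(range(n))
    drawn = 0
    while len(seen) < target:
        rng.shuffle(threads)
        # combinations() preserves positional order, so each tuple is the
        # relative order of one t-subset within this permutation.
        seen.update(combinations(threads, t))
        drawn += 1
    return drawn


rng = random.Random(0)   # fixed seed so that the run is repeatable
trials = [permutations_until_covered(8, 3, rng) for _ in range(20)]
print(min(trials), max(trials))
```

Each permutation exhibits exactly one ordering per subset, so at least t! permutations are always needed; the interest lies in how slowly the observed count grows as n increases.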

To generalize to higher context-bounds, we consider a pre-empted
thread as two distinct threads (thread fragments) in this
algorithm. Hence, for context-bound c bugs on a program with
at most n threads, we consider n + c distinct thread fragments.
To cover all t-thread bugs at context-bound c, it suffices if we
enumerate all (t + c)! permutations of all (t + c)-sized subsets of the
n + c thread fragments. (This is more than what is strictly required
because here we are also enumerating orderings between thread
fragments belonging to the same thread.) Hence, using Lemma 4.1,
the number of schedules that need to be executed before all t-thread
bugs have been tested at context bound c with high probability is
O((t + c + 1)! log(n + c)).

To summarize, the exploration algorithm works as follows. A
random permutation of the numbers 1, ..., (n + c) is generated at the
start of each execution run. Let us label the generated permutation
$P_1, \ldots, P_{n+c}$. The scheduler uses strict priority scheduling using
$P_1, \ldots, P_n$ as the priorities of threads 1, ..., n respectively. On the
ith pre-emptive context switch, the priority of the running thread
is changed to $P_{n+i}$. If (t + c + 1)! log(n + c) such executions
are performed, each time with a new random permutation, we
expect all t-thread bugs at context bound c to be covered with
high probability. (If variable bounding is also being used, then
this is repeated for each set of variables.) Notice that the algorithm
is independent of t; we only provide probabilistic guarantees on
the absence of bugs with thread-bound less than t after a certain
number of schedules have been executed.
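The priority assignment summarized above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the RankChecker implementation: the class name `PriorityAssigner` and its methods are hypothetical, and we assume that a larger number means a higher priority.

```python
import random


class PriorityAssigner:
    """Per-run priorities derived from one random permutation P_1..P_{n+c}."""

    def __init__(self, n_threads, context_bound, rng):
        self.n = n_threads
        self.c = context_bound
        self.perm = list(range(1, n_threads + context_bound + 1))
        rng.shuffle(self.perm)
        # Threads 1..n start with priorities P_1..P_n.
        self.priority = {tid: self.perm[tid - 1]
                         for tid in range(1, n_threads + 1)}
        self.preemptions = 0

    def pick_next(self, enabled):
        """Strict priority scheduling (assumption: larger value wins)."""
        return max(enabled, key=lambda tid: self.priority[tid])

    def on_preemption(self, tid):
        """On the i-th pre-emptive context switch, thread tid gets P_{n+i}."""
        if self.preemptions >= self.c:
            raise RuntimeError("context bound exhausted")
        self.preemptions += 1
        self.priority[tid] = self.perm[self.n + self.preemptions - 1]


assigner = PriorityAssigner(n_threads=3, context_bound=2, rng=random.Random(0))
running = assigner.pick_next([1, 2, 3])
assigner.on_preemption(running)          # first pre-emption of this run
print(running, assigner.priority)
```

A fresh `PriorityAssigner` would be created for every execution run, so that each run uses a new random permutation.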

5. Implementation 

We implement variable and thread bounding in a concurrency
testing tool for Java, called RankChecker. RankChecker instruments
the binary class code of a Java program and associated libraries
to insert appropriate schedule points. It does not require
any source-level annotations. We instrument Java bytecode using
the Javassist library [6]. The instrumented test program is linked
with a RankChecker library that implements a scheduler to dictate
the thread interleavings. We implement static alias analysis using
BDDs, similar to that used in I2lll2al . Like previous approaches 
on systematic and probabilistic testing [5], [20], the program under
test is required to be terminating, so that it can be run repeatedly to
explore different schedules. It is usually straightforward to convert
a non-terminating program to a terminating program.

We implement two different algorithms: exhaustive and
randomized. The exhaustive algorithm searches the state space of all
schedules systematically. The randomized algorithm searches the
state space randomly, with probabilistic guarantees on the
probability of finding a bug of a certain type (e.g., depth).

We first discuss the implementation of the exhaustive search
strategy. The pseudo-code is shown in Algorithm 1. The
algorithm is invoked for each set of variables (determined using variable
bounding). For each set of variables, a set of thread priority orders
threadPrios is generated and executed. Strict priority scheduling
is followed (line 28) and priorities are changed at variable
accesses using the thread bounding algorithm (line 37).

A program state s is identified by the partial thread schedule 
that was executed. We implement a simple record-replay mecha- 
Algorithm 1 Iterative context bounding algorithm for t-thread bugs
Input: initial state s0 ∈ State.

 1 struct WorkItem { State state; Priorities prio; }
 2 Queue<WorkItem> workQueue;
 3 Queue<WorkItem> nextWorkQueue;
 4 WorkItem w;
 5 Queue<Priorities> threadPrios;
 6 threadPrios.init(t);
 7 int currBound := 0;
 8 for prio ∈ threadPrios do
 9     workQueue.Add(WorkItem(s0, prio));
10 end for
11 while true do
12     while ¬workQueue.Empty() do
13         w := workQueue.PopFront();
14         Search(w);
15     end while
16     if nextWorkQueue.Empty() || currBound == c then
17         Exit();
18     end if
19     currBound := currBound + 1;
20     workQueue := nextWorkQueue;
21     nextWorkQueue.Clear();
22 end while
23 function Search(WorkItem w) begin
24     WorkItem x;
25     State s;
26     TID effTid;
27     bool tidenabled, varaccess;
28     Thread tid := highestPriorityEnabledThread(w.prio);
29     s := w.state.Execute(tid);
30     tidenabled := (tid ∈ enabled(s));
31     varaccess := (tid returned due to varaccess());
32     x := WorkItem(s, w.prio);
33     Search(x);
34     if (tidenabled && varaccess) then
35         // pre-emptive cswitch. gen a schedule
36         effTid := effTidOfCurrentThread();
37         changeEffTidOfCurrentThread(effTid + MaxThreads);
38         x := WorkItem(s, prio);
39         nextWorkQueue.Push(x);
40     end if
41 end



nism, whereby a thread schedule is recorded and later replayed to 
reconstruct the same state. As noted in [20], replays may not result
in identical states due to other sources of non-determinism (e.g., 
environment, non-deterministic calls, etc.). Our current implemen- 
tation deals with these issues by enforcing a deterministic input at 
all these non-deterministic points through bytecode instrumenta- 
tion. 

We instrument the target program separately for each subset of
variables being tracked. For a fixed (v, t) value, the enumeration
algorithm iteratively explores the schedules with context bound
0, 1, ..., c (c is the maximum desired context-bound value). While
enumerating schedules for context bound currBound, schedules
are generated for context-bound currBound + 1. Our algorithm
is very similar to that presented in [19], with the following
differences:



1. The instrumented program points include memory accesses to 
the variables being tracked, and not just explicit synchroniza- 
tion points. As we show later, variable-bounding allows us to 
do this without significant increase in running times. Each in- 
strumented program point yields control to our scheduler. 

2. When a thread yields control to the scheduler (line 29), the 
address of the currently accessed variable is compared with the 
set of variables being tracked (variable bounding). Recall that it 
is possible that even though the variable access is instrumented, 
the accessed variable does not belong to the set of variables 
being tracked. This can happen either due to the imprecision 
of the static alias analysis or in cases where multiple variables 
are allocated by the same heap-allocation statement. If the ac- 
cessed variable belongs to the set of variables being tracked, 
the priority of the executed thread is re-assigned, as discussed 
in Section 4.

We also instrument all entries and exits from synchronized
blocks, calls to wait and notify, and other thread library
functions like Thread.create, Thread.join, Thread.yield,
Thread.suspend and Thread.resume. We replace all synchronization
function calls with calls to the appropriate scheduler
functions, through instrumentation. The scheduler function
emulates the requested operation and returns to the enumeration
algorithm (at line 29). The enumeration algorithm then selects the
highest-priority active thread (which could have changed due to
the synchronization operation) and executes it. For illustration,
Figure 11 shows the scheduler's emulation functions for wait() and
notify(). All calls to wait() and notify() in the target
program are replaced with calls to wait_s() and notify_s()
respectively.

void wait_s(cond, mutex) {
    // block the current thread on cond
    curthread.waitingOn = cond;
    curthread.status = BLOCKED;
    add_to_blocked_threads(curthread);
    // waiting releases the monitor, so wake threads blocked on it
    wakeup_threads_blocked_on(mutex);
    return to scheduler
}

void notify_s(cond, mutex) {
    // wake the threads waiting on cond
    wakeup_threads_blocked_on(cond);
    return to scheduler
}



Figure 11. The scheduler's wait() and notify() functions

All program instructions where one of the variables being
tracked is accessed are also instrumented with a call to the scheduler
function varaccess(). The varaccess() function simply
returns to the enumeration algorithm (at line 29). The
instructions that could potentially access a tracked variable are identified
using static alias analysis.

Here, we also point out that our definition of context-bound 
differs from previous work [19] in a subtle way. While the previous
work counts all pre-emptive context switches towards the context- 
bound, we only count the pre-emptive context switches that violate 
the current priority order. For example, in our scheme, it is possible 
for a low-priority thread to be pre-empted in favor of a high-priority 
thread after thread creation, even at c = 0. We do not count such 
pre-emptions towards the context-bound. 

Usually, priority-based schemes suffer from issues like priority 
inversion and starvation. Because we require all our threads to be 
terminating, this is not an issue in our implementation. A priority- 
based scheme also violates any assumptions of strong fairness Ql 
which says that every thread will eventually be run. As also noted
in [20], many programs implicitly make this assumption. For
example, while-flags (or spin-loops) are a common synchronization
construct that assumes strong fairness. These loops will never
terminate if the thread that is supposed to set the condition of the
loop starves. CHESS avoids this situation by assuming that a thread
yields when it is not able to make progress, and assigning lower
priority to threads calling thread_yield(). In our enumeration
scheme, lowering the priority of a thread on a call to yield() may
cause certain schedules to never get enumerated, because unlike
CHESS, we enumerate only a small set of priority orders among
threads (thread-bounding). To guard against the possibility of
infinite loops, we lower the priority of a thread if we observe that
thread to yield() more than 100 times. This threshold avoids
infinite loops, and yet is reasonably large to not cause interference
with our thread-bounding algorithm.
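The yield threshold can be sketched as a per-thread counter. This is a minimal illustration, not the RankChecker code: the class name `YieldTracker` is hypothetical, and we assume the counter is never reset within a run, which the text above does not specify.

```python
class YieldTracker:
    """Counts yield() calls per thread and flags threads to be demoted."""

    def __init__(self, threshold=100):
        self.threshold = threshold
        self.yields = {}

    def on_yield(self, tid):
        """Record one yield() by thread tid. Returns True once the thread
        has yielded more than `threshold` times, i.e. when the scheduler
        should lower its priority."""
        self.yields[tid] = self.yields.get(tid, 0) + 1
        return self.yields[tid] > self.threshold


tracker = YieldTracker()
demote = [tracker.on_yield("spinner") for _ in range(101)]
print(demote.count(True))
```

Keeping the threshold large means that ordinary, short-lived yields leave the priority order chosen by thread bounding untouched.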

Similar to CHESS [20], we use happens-before relations to
construct a happens-before graph to prune the schedules. The
happens-before graph characterizes the partial order of related operations
in a program execution. The nodes of the happens-before graph
are the executed instructions. A happens-before directed edge is
drawn between two instructions iff the two instructions execute in
different threads, the first instruction executes before the second
instruction in the given schedule, and the two instructions access the
same variable, with at least one access being a write. The pruning
is based on the observation that two schedules with identical
happens-before graphs result in the same program state. For a given
variable set, if one schedule has an identical happens-before graph
to another previously enumerated schedule, this schedule (and all
its derivative schedules) need not be enumerated. Pruning is not
performed across distinct variable sets. We note that because our
thread-bounding algorithm is randomized, our exhaustive search
algorithm is not strictly exhaustive. But as stated in Lemma 4.1,
the probability that we have not exhausted the search space can be
made arbitrarily small by executing a sufficiently large number of
random priority orders.
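The pruning rule can be illustrated with a small sketch. It is not the RankChecker implementation: the names `happens_before_edges` and `SchedulePruner` are hypothetical, a schedule is modelled as a list of (thread, variable, is_write) tuples, and an instruction is identified by its thread and its position within that thread, which assumes every thread executes the same instruction sequence in the schedules being compared.

```python
def happens_before_edges(schedule):
    """Return the happens-before edges of a schedule as a frozenset.

    schedule: list of (thread, variable, is_write) in execution order.
    An edge links two accesses from different threads to the same
    variable when at least one of the two accesses is a write."""
    next_index = {}
    events = []
    for thread, variable, is_write in schedule:
        index = next_index.get(thread, 0)
        next_index[thread] = index + 1
        events.append(((thread, index), variable, is_write))

    edges = set()
    for i, (first, var1, write1) in enumerate(events):
        for second, var2, write2 in events[i + 1:]:
            if first[0] != second[0] and var1 == var2 and (write1 or write2):
                edges.add((first, second))
    return frozenset(edges)


class SchedulePruner:
    """Remembers the happens-before graphs seen for one variable set."""

    def __init__(self):
        self.seen = set()

    def should_explore(self, schedule):
        graph = happens_before_edges(schedule)
        if graph in self.seen:
            return False        # equivalent schedule already enumerated
        self.seen.add(graph)
        return True


pruner = SchedulePruner()
s1 = [("T1", "x", True), ("T2", "y", True)]
s2 = [("T2", "y", True), ("T1", "x", True)]   # independent accesses reordered
print(pruner.should_explore(s1), pruner.should_explore(s2))
```

Because pruning is not performed across distinct variable sets, one such pruner would be kept per variable set.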

We also implement a randomized testing algorithm in
RankChecker to test variable and thread bounding. The
randomized algorithm simply picks a set of v variables (globals and
heap-allocation statements) randomly, and then picks priority change
points at accesses to these variables. The values of the maximum
number of accesses, $k_{q_1}, \ldots, k_{q_Q}$, to variables
$q_1, \ldots, q_Q$ respectively, are estimated by running the program
without priority scheduling multiple times and counting the average
number of accesses to each variable in these runs. The priority change
points are picked uniformly over the interval $[1, k_{q_i}]$. Our
randomized algorithm is modeled after PCT's depth-bounding. The only
difference between our algorithm and PCT is in the assignment of
priorities. PCT generates a set of random priority orders, such that
each thread gets to be the lowest priority thread in at least one of
the priority orders. Also, on a priority change point, PCT decreases
the priority of the current thread to become lower than the priority
of all currently-executing threads. Our priority orders are instead
chosen using the thread-bounding algorithm given in Section 4.
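The selection of variables and priority change points can be sketched as follows. This is an illustration under our own assumptions rather than the RankChecker code: the function name `pick_change_points` is hypothetical, the access counts are made up, and we assume d - 1 change points for a depth-d bug, following PCT.

```python
import random


def pick_change_points(access_counts, v, d, rng):
    """Pick v variables at random, then d - 1 priority change points.

    access_counts: dict mapping each variable q_i to its estimated
    maximum number of accesses k_{q_i}.
    Returns (chosen_variables, change_points), where each change point is
    a (variable, access_index) pair with access_index in [1, k_{q_i}]."""
    chosen = rng.sample(sorted(access_counts), v)
    points = []
    if chosen:
        for _ in range(d - 1):
            variable = rng.choice(chosen)
            points.append((variable, rng.randint(1, access_counts[variable])))
    return chosen, points


# Hypothetical access-count estimates gathered from preparatory runs.
estimates = {"head": 40, "tail": 38, "size": 3, "lock_flag": 1}
print(pick_change_points(estimates, v=1, d=2, rng=random.Random(0)))
```

For the v = 1, d = 2 case, a particular access to variable $q_b$ is selected with probability $\frac{1}{Q \cdot k_{q_b}}$, which matches the bound used in the variable bounding analysis.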



6. Experimental Results 

We perform experiments to answer the following questions: 

• What are the typical values of variable-bound and thread-bound 
in common concurrency bugs? 

• What is the runtime improvement due to variable bounding? 

• For the exhaustive search strategy, do variable and thread bounding
reduce the number of executions required to expose a bug?

• For the randomized search strategy, do variable and thread
bounding reduce the number of executions required to expose a bug?

We picked a variety of small and large Java programs and one
C# program as test programs to evaluate our algorithms. The
details of these programs are given in Table 1. The first 13 programs
are from the ConTest Concurrency Benchmark Suite [9]. All these
programs contain a concurrency bug. The next 8 benchmarks are
multi-threaded Java programs commonly used to evaluate
concurrency testing and verification tools. Some of these programs contain
bugs. The last program (RegionOwnership) is a C# program
containing a reasonably complex concurrency bug. This program has
been previously analyzed using CHESS [8]. As we discuss later,
we have also implemented variable and thread bounding in the
CHESS tool to test C# programs. We report our experiences with
variable bounding on the RegionOwnership benchmark. Within a
variable and thread bound, we further rank our schedules based on
the loop iteration number (recall Section 2). For exhaustive search
experiments, while choosing our variable set, we give priority to
shared variables, i.e., variables known to be shared are chosen
before other variables. A variable is known to be shared if, in one of
the preparatory runs, we found it being accessed by at least
two threads.

We ran RankChecker on the programs containing known bugs
with variable bounding to check the bug characteristics. Table 2
lists the (c, v, t) values at which these bugs were uncovered using
the exhaustive algorithm. We found that all these bugs were
c ≤ 2, v ≤ 1, t = 2 bugs. We also surveyed past papers on studying
concurrency bugs and bug databases of popular applications, to
study the bugs reported in them. We found that all these bugs were
of type c ≤ 2, v ≤ 2, t = 2.

We provide pseudo-code of the c = 2, v = 1, t = 2 bug found
in AllocationVector in Figure 12.



Thread 1                              Thread 2

Block b = FindFreeBlock();
          first context switch
                                      Block b = FindFreeBlock();
                                      ASSERT(IsBlockFree(b));
                                      MarkBlockAllocated(b);
          second context switch
ASSERT(IsBlockFree(b));  // FAILS!
MarkBlockAllocated(b);
FreeAllBlocks();
                                      FreeAllBlocks();

Figure 12. Pseudo-code showing the c = 2, v = 1, t = 2
bug in AllocationVector. The routines FindFreeBlock(),
MarkBlockAllocated(), and IsBlockFree() are all
synchronized (i.e., protected by a monitor lock). FindFreeBlock()
searches a global vector to find an unallocated block.
MarkBlockAllocated() sets a flag in block b and
IsBlockFree() checks that flag.

We next discuss the improvements in running time due to
variable bounding. Table 3 shows our results on some of our Java
programs. The other Java programs were too small to show any
meaningful improvements. The runtime statistics have been averaged
over several runs of the programs. With variable bounding, there
is up to 100x improvement in the runtime cost of instrumentation.
The runtime improvement depends on the proportion of
computation and I/O in the test program. Variable bounding results in
improvement because only program statements identified by alias
analysis as potential accesses to our set of tracked variables need to
be instrumented. The performance of an instrumented run is now
comparable to that of a native run, which makes it practical to
implement systematic testing algorithms where all variables are
considered as potential pre-emption points. (The native run is
sometimes slower than the instrumented run; this happens due to the
overhead of process creation in the native run which does not exist
Benchmark | SLOC | # Threads | # Variables | Bug? | Description



ConTest Benchmarks
MergeSort | 376 | 100 | 564 | Yes | Sorts a set of integers using mergesort
Producer Consumer | 279 | 7 | 61 | Yes | Simulates producer-consumer behavior
LinkedList | 420 | 3 | 60 | Yes | LinkedList's implementation with test-harness
BubbleSort | 365 | 9 | 54 | Yes | Sorts a set of integers using bubblesort
BubbleSort2 | 129 | 101 | 105 | Yes | Sorts a set of integers using bubblesort
Piper | 210 | 9 | 33 | Yes | Manages airline reservations
Allocation Vector | 288 | 3 | 4010 | Yes | Manages free and allocated blocks in a vector
BufWriter | 259 | 5 | 27 | Yes | Reads and writes to a buffer concurrently
PingPong | 276 | 18 | 25 | Yes | Simulates the behavior of ping-pong game
Manager | 190 | 6 | 25 | Yes | Manages free and allocated blocks
MergeSortBug | 258 | 29 | 52 | Yes | Sorts a set of integers using mergesort
Account | 169 | 3 | 26 | Yes | Manages a bank account
AirLineTickets | 99 | 11 | 5 | Yes | Simulates selling of airline tickets

Java's Library in JDK 1.4.2
HashSet | 7086 | 200 | 4777 | Yes | Thread-safe implementation of HashSet
TreeSet | 7532 | 200 | 6140 | Yes | Thread-safe implementation of TreeSet

Other Java Benchmarks
Cache4j | 3897 | 12 | 251,469 | No | Cache implementation for Java objects
Molydn | 1410 | 8 | 121,371 | No | Benchmark from Java Grande Forum
Montecarlo | 3630 | 8 | 452,700 | No | Benchmark from Java Grande Forum
TSP | 719 | 18 | 84 | No | Travelling Sales Problem's implementation
Blocking Queue | 57 | 3 | 38,828 | No | Tests BlockingQueue library implementation
Sor | 17,738 | 6 | 53 | No | Successive Order Relaxation method's implementation

C# Benchmark
RegionOwnership | 1500 | 5 | 41 | Yes | Manages coordination for objects communicating using async calls



Table 1. Test programs and their details. 



Program Name | BCI | var sites | # of accesses | Native time (sec) | v0 (sec) | v1 (sec) | v2 (sec) | v-all (sec) | v-all/v1
Cache4j | 231.1m | 101 | 21.4m | 0.34 | 0.47 | 1.23 | 2.76 | 26.38 | 21.3
Molydn | 2.33b | 209 | 1.4b | 0.39 | 3.15 | 11.59 | 19.86 | 1239.76 | 106.3
Montecarlo | 577.7m | 235 | 446.96m | 0.48 | 1.94 | 4.74 | 5.21 | 323.12 | 68.08
TSP | 8.76b | 65 | 2.55b | 4.2 | 4.23 | 32.64 | 109.72 | 1180.28 | 36.15
Blocking Queue | 3.4m | 13 | 0.65m | 0.17 | 0.18 | 0.194 | 0.202 | 1.386 | 7.14
Sor | 0.2m | 46 | 0.68m | 0.07 | 0.25 | 0.249 | 0.348 | 0.392 | 1.57
HashSet | 157.4k | 137 | 16889 | 0.07 | 0.0775 | 0.0901 | 0.0976 | 0.2687 | 2.98
TreeSet | 113k | 146 | 16273 | 0.69 | 0.078 | 0.089 | 0.09 | 0.259 | 2.91



Table 3. The columns in this table give the name of the program, the (average) number of bytecode instructions executed
by the program (BCI), the total number of different instrumentation sites (which include heap-allocation statements and global variables), the total number
of accesses, the native execution time, the average amount of time taken for one execution when we are tracking 0, 1, 2, and all variables,
respectively, and the ratio of the v-all and v1 columns.



in our instrumented run.) This is a significant improvement over
previous work, where only synchronization operations have been 
considered as potential pre-emption points p , I20TI . 

Previously, a tool called RaceFuzzer [24] reported a c = 1, v = 1,
t = 2 concurrency bug (data race) in cache4j. Our tool could
not find this bug even after exhaustively enumerating all schedules
up to c ≤ 2, v ≤ 2, t = 2. On deeper inspection, we found that
the bug did not exist. It turned out that RaceFuzzer had generated
a false bug report due to an error in the modelling of the semantics
of the Java interrupt exception in the tool. We reported this to the
author of RaceFuzzer [24], and he did not object to our findings.
Because RankChecker actually runs a schedule to try and trigger
assertion failures, a bug report and the associated schedule reported
by it also serve as a proof of the bug's existence.



Variable and Thread Bounding in CHESS 

We further validate the effectiveness of variable and thread
bounding in practice by implementing it inside CHESS [20] and testing
it on C# benchmarks that were previously used with CHESS [8].
However, we did not have an alias analysis readily available for
C#; thus, we only implemented a simple form of variable
bounding that works as follows. Let VT-CHESS refer to our extension of
CHESS with variable bounding. Suppose VT-CHESS is executed
on program P with variable bound v and pre-emption bound c. If
v ≥ c then VT-CHESS behaves exactly like CHESS. When v < c,
then during an execution of P, VT-CHESS records the shared
variables accessed just before the first v pre-emptions in the execution.
Subsequent pre-emptions (the (v + 1)th to the cth) are constrained to occur
only after an access of one of these v variables. In other words,
Program Name | Schedules Explored | (c, v, t)
MergeSort | 651 | (1,1,2)
Producer Consumer | 1 | (0,0,2)
LinkedList | 23 | (1,1,2)
BubbleSort | 1 | (0,0,2)
BubbleSort2 | 2 | (0,0,2)
Piper | 64 | (1,1,2)
Allocation Vector | 113 | (2,1,2)
BufWriter | 12 | (0,0,2)
PingPong | 234 |
Manager | 33 | (1,1,2)
MergeSortBug | 89 | (1,1,2)
Account | 19 | (1,1,2)
AirLineTickets | 2 | (0,0,2)
HashSet | 813 | (1,1,2)
TreeSet | 813 | (1,1,2)



Table 2. The columns in this table give the name
of the program, the number of schedules explored until we found the
first bug, and the tuple (c, v, t) at which the bug occurs.

the v variables for variable bounding are picked up dynamically at 
run-time. 
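The dynamic form of variable bounding described above can be sketched as a small filter that decides whether a pre-emption is allowed. This is an illustration of the stated rule, not code from CHESS or VT-CHESS; the class name `PreemptionFilter` and its methods are hypothetical.

```python
class PreemptionFilter:
    """Allows at most c pre-emptions per execution; after the first v,
    a pre-emption is only allowed right after an access to one of the
    variables recorded at those first v pre-emptions."""

    def __init__(self, v, c):
        self.v = v
        self.c = c
        self.tracked = set()     # variables picked dynamically at run-time
        self.preemptions = 0

    def may_preempt(self, last_accessed_var):
        if self.preemptions >= self.c:
            return False                     # pre-emption bound reached
        if self.preemptions < self.v:
            return True                      # first v pre-emptions are free
        return last_accessed_var in self.tracked

    def record_preemption(self, last_accessed_var):
        if self.preemptions < self.v:
            self.tracked.add(last_accessed_var)
        self.preemptions += 1


f = PreemptionFilter(v=1, c=3)
f.record_preemption("queue_head")           # first pre-emption picks the variable
print(f.may_preempt("queue_tail"), f.may_preempt("queue_head"))
```

When v ≥ c the tracked set never constrains anything, so the filter reduces to plain pre-emption bounding, in line with the statement that VT-CHESS then behaves exactly like CHESS.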

The deepest reported bug found using CHESS is in a program
called RegionOwnership. It is a C# library that manages
concurrency and coordination for objects communicating via
asynchronous procedure calls. The library is accompanied by a single
test case comprising a one-producer one-consumer system. The
library is 1500 lines of code, and an execution accesses a
synchronization variable at most 280 times. The test reveals a bug that
requires at least 3 pre-emptions.

Table 4 shows the number of executions and time taken before
VT-CHESS either reported a bug or finished exploring all behaviors
under the given bound. We used c = 3, t = 2 in all invocations of
VT-CHESS. VT-CHESS was able to find the bug about 6 times
faster than CHESS while using a variable bound of 2. Using a
variable bound of 1 does not expose the bug, but Table 4 shows
a further reduction in search space when this bound is imposed.





 | Bug found? | # Executions | Time (sec)
No VB, t = 2 | Yes | 132507 | 6897.3
v = 2, t = 2 | Yes | 47248 | 1224.4
v = 1, t = 2 | No | 30437 | 581.0



Table 4. Experiments with the RegionOwnership benchmark. 

Variable and Thread Bounding in Randomized Algorithms 

All the bugs found in our test programs, except RegionOwnership,
were of type v ≤ 1. As seen in Tables 5 and 4, variable bounding
improves both runtime and the number of schedules explored while
systematically testing concurrent programs. To further study the
effect on bugs with higher v, t values, we modified one of our test
programs such that it had a bug of the required type. We compared
RankChecker on this benchmark with and without variable/thread
bounding for bugs with different c, v, t characteristics. Table 5
presents our results.

As expected, the time required to find the bug decreases
dramatically with variable bounding. The number of executions required
to find a v = 0 bug is roughly the same with and without
variable bounding, but increases with the thread-bound of the bug. The
number of executions required to find the bug improves with
variable bounding at v ≥ 1, for the reasons discussed in Section 5.



Bug Type (c, v, t) | With v,t bounding: # Executions | With v,t bounding: Time (sec) | Without v,t bounding: # Executions | Without v,t bounding: Time (sec)
(0,0,2) | 3.1 | 10.9 | 2.7 | 768.9
(0,0,3) | 3.7 | 12.1 | 3.5 | 1010.7
(0,0,4) | 4.9 | 14.9 | 3.8 | 1101.7
(1,1,2) | 1636.2 | 6409.9 | | TimedOut
(1,1,3) | 4889 | 20371.2 | | TimedOut
(2,1,2) | 28121 | 112950.3 | | TimedOut



Table 5. This table reports the average number of executions and time
required to capture a bug of type (c, v, t) introduced in the Montecarlo
benchmark. Without variable bounding, our tool timed out after executing
for more than 3 days for c ≥ 1, v ≥ 1, without finding the bug.



7. Related Work 

There is a large body of work on static @, l2lll and dynamic 0, 
UH, [O, |22|, I23II techniques to uncover concurrency bugs. While 
many of these techniques are very effective and have uncovered 
a variety of previously-unknown bugs in well-tested software, the 
current dominant practice in the software industry still remains 
stress testing. There are a few important likely reasons for this 
(apart from plain inertia): 

1. Plain testing is more natural. 

2. Most tools target a small class of bugs. For example, some tools 
target only data races, others target only atomicity-violation
bugs, and yet others target only deadlocks. It is confusing for 
a developer to understand the function of each tool and apply 
them separately. 

3. Many tools have false positives. Spending time and energy on a 
false-positive bug report is annoying and counter-productive. 

4. Many tools require source-level annotations. Many tools rely 
on certain programming disciplines, e.g., all shared memory 
accesses must be protected by a lock. At places where the 
programmer deliberately violates this discipline, source code 
annotations are required. Most developers are usually reluctant 
to annotate their source code for better testing. 

5. Many tools have a high runtime cost and provide low
coverage.

Model checking approaches 1 4L [ToL ITll . IT^L [2CL [2^1 are closer to 
the familiar idiom of testing. Our model checker targets all types 
of concurrency bugs, has no false positives, and requires no source- 
level annotations. We address the state explosion problem by rank- 
ing the schedules using v, t to maximize coverage in the first few 
schedule executions. Previous approaches have reduced this search
space either by limiting context switches to synchronization
operations [20] (resulting in potential false negatives), or by using an
offline memory trace of the program to identify and rank unserializable
interleavings [22] (primarily to identify atomicity-violation
bugs). We believe that variable and thread bounding are more general
methods of ranking (or reducing) the search space.
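
To make the idea of ranking schedules concrete, the following is a minimal sketch of a stateless exploration that orders schedules by preemption count, i.e., plain iterative context bounding rather than our v,t ranking. It is not RankChecker: the two-thread counter program and all function names are assumptions made for illustration. Each thread performs a non-atomic increment (a read followed by a write) on one shared variable, so the lost-update bug has characteristics c = 1, v = 1, t = 2.

```python
from itertools import permutations

# Hypothetical toy program: each thread increments one shared counter
# non-atomically. Step 0 reads the counter into a local; step 1 writes
# local + 1 back.
THREADS = ("A", "B")
STEPS_PER_THREAD = 2


def all_schedules():
    """Every interleaving of the threads' steps, as strings like 'ABBA'."""
    slots = "".join(t * STEPS_PER_THREAD for t in THREADS)
    return sorted(set("".join(p) for p in permutations(slots)))


def preemptions(schedule):
    """Count switches away from a thread that still has steps left."""
    done = {t: 0 for t in THREADS}
    count = 0
    for i, tid in enumerate(schedule):
        done[tid] += 1
        nxt = schedule[i + 1] if i + 1 < len(schedule) else tid
        if nxt != tid and done[tid] < STEPS_PER_THREAD:
            count += 1
    return count


def run(schedule):
    """Execute one schedule and return the final counter value."""
    counter = 0
    local = {}
    pc = {t: 0 for t in THREADS}
    for tid in schedule:
        if pc[tid] == 0:
            local[tid] = counter        # read step
        else:
            counter = local[tid] + 1    # write step
        pc[tid] += 1
    return counter


def first_failing_schedule():
    """Explore schedules in increasing preemption count; stop at the first bug."""
    ranked = sorted(all_schedules(), key=lambda s: (preemptions(s), s))
    for executions, schedule in enumerate(ranked, start=1):
        if run(schedule) != len(THREADS):   # an increment was lost
            return schedule, preemptions(schedule), executions
    return None


print(first_failing_schedule())
```

In this toy program the two schedules with zero preemptions run correctly, and the first schedule with one preemption loses an update. Variable and thread bounding add further ranking keys on top of the preemption count; they are not modeled in this sketch.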

CHESS [20] uses iterative context bounding to rank schedules.
We borrow many ideas from CHESS, including iterative context
bounding [19], the use of a happens-before graph for stateless model
checking, and fair scheduling. We rank schedules further so that
most bugs are uncovered with a smaller number of schedules.
While we consider all shared memory accesses as potential context-switch
points, CHESS allows preemptive context switches only
at explicit synchronization primitives. This restriction (first used in
ExitBlock [4]) is justified if all shared memory accesses are protected
by explicit synchronization (e.g., lock/unlock). CHESS
relies on a data-race detector to separately check this property. Even
if we assume a precise and efficient data-race detector, this approach
still overlooks ad hoc synchronization that does not involve
known synchronization primitives [27].
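
As an illustration of ad hoc synchronization, the following sketch hands a value from one thread to another using only an ordinary flag variable. It is a hypothetical example written for this discussion, not code from any of the benchmarks; run_handoff and the shared dictionary are assumed names. The point is that the program contains no lock, unlock, or other known primitive, so a scheduler that preempts only at such primitives has no context-switch points to explore between the flag accesses.

```python
import threading
import time


def run_handoff():
    """Hand a value from a producer to a consumer using only a plain flag."""
    shared = {"data": None, "ready": False}   # no lock, no condition variable
    result = []

    def producer():
        shared["data"] = 42
        shared["ready"] = True        # ad hoc "release": an ordinary write

    def consumer():
        while not shared["ready"]:    # ad hoc "acquire": an ordinary read
            time.sleep(0)             # yield so the producer can run
        result.append(shared["data"])

    threads = [
        threading.Thread(target=consumer),
        threading.Thread(target=producer),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return result[0]


print(run_handoff())
```

A tool that treats every shared memory access as a potential context-switch point sees the reads and writes of the flag and can reorder threads around them. A tool that interposes only on synchronization primitives sees nothing to interpose on in this program.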

VeriSoft 00] also uses an exploration strategy to model- 
check "distributed" systems using a state-less search (i.e., storage 
of previously visited states is not required). VeriSoft uses partial-
order methods to reduce redundancy, similar to happens-before
graph pruning used in CHESS and in our tool. S. Stoller 1231 uses 
a similar approach to model-check multi-threaded distributed Java 
programs. We believe that variable and thread bounding ideas are 
equally relevant to these model checking approaches as well. 

Our work is complementary to race-detection tools IT8l . l2ll . l23l 
I24ll28h . deadlock-detection tools 01, and atomicity-violation de- 
tection tools 1171 12211 . We do not focus on a particular class of 
bugs, but rather drive a model checker into exploring interesting 
schedules that are likely to uncover all these bugs early. In prac- 
tice, small v,t values uncover most data races, deadlocks, and 
atomicity-violation bugs. 

8. Conclusion 

We present variable and thread bounding to rank thread sched- 
ules for systematic testing of concurrent programs. Through ex- 
periments on a variety of Java and C# programs, we find that the 
ranking significantly aids early discovery of common bugs. 